1,607 research outputs found

    Privacy Preserving Clustering with Constraints

    The k-center problem is a classical combinatorial optimization problem that asks for k centers such that the maximum distance from any input point in a set P to its assigned center is minimized. The problem admits elegant 2-approximations. However, it becomes significantly more difficult when constraints are added. We ask whether general methods can be derived to turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that respects one additional constraint. Our constraint of choice is privacy: a center may only be opened when at least l clients will be assigned to it. We show how to combine privacy with several other constraints.
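
    As an illustration of the unconstrained objective described in this abstract, the sketch below uses greedy farthest-first traversal, a well-known way to obtain a 2-approximation for k-center in a metric space. It is not the paper's algorithm and ignores the privacy constraint; the function name `farthest_first_k_center` and the choice of Euclidean distance are assumptions made for the example.

```python
import math


def farthest_first_k_center(points, k):
    """Greedy farthest-first traversal for unconstrained k-center.

    Hypothetical helper: points are coordinate tuples and the metric is
    Euclidean distance (an assumption, not taken from the abstract).
    Returns the chosen centers and the resulting maximum distance.
    """
    if not points or k <= 0:
        return [], 0.0
    # Start from an arbitrary point; here simply the first one.
    centers = [points[0]]
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Open the point that is currently farthest from all centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        for j, p in enumerate(points):
            dist[j] = min(dist[j], math.dist(p, points[i]))
    return centers, max(dist)
```

    For example, on the four points (0, 0), (1, 0), (10, 0), (11, 0) with k = 2, the sketch opens (0, 0) and (11, 0), and every point is within distance 1 of its center.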

    On the Cost of Essentially Fair Clusterings

    Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and an O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
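
    The fairness requirement in this abstract, that each cluster approximately preserves the global fraction of each protected class, can be made concrete with a small check. The sketch below is only one possible reading: the additive `tolerance` parameter and the function names are assumptions for illustration and do not reproduce the paper's exact fairness definitions.

```python
from collections import Counter


def class_fractions(labels):
    """Return the fraction of each protected class among the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def is_approximately_fair(clusters, tolerance):
    """Check that every cluster roughly preserves the global class fractions.

    clusters: list of lists, each holding the protected-class label of every
    point in one cluster. tolerance: hypothetical additive slack allowed
    between a cluster's class fraction and the global fraction.
    """
    all_labels = [label for cluster in clusters for label in cluster]
    global_frac = class_fractions(all_labels)
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster imposes no requirement in this sketch
        local_frac = class_fractions(cluster)
        for c, g in global_frac.items():
            if abs(local_frac.get(c, 0.0) - g) > tolerance:
                return False
    return True
```

    With two classes "a" and "b" in equal global proportion, the clustering [["a", "b"], ["a", "b"]] passes even with zero tolerance, while [["a", "a"], ["b", "b"]] fails for any tolerance below 0.5.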

    Wichtige Faktoren des Wintergetreideertrages nach Körnerleguminosen in der Ökolandbaupraxis

    With the aim of identifying key factors for cereal yield and mineral nitrogen content, 87 fields of wheat, spelt, rye, triticale and barley on 31 organic farms were evaluated from 2009 to 2011. According to multiple linear regression analysis, factors leading to high cereal yields were high mineral nitrogen content, high soil water storage and high cereal coverage in spring, as well as deep soil, low weed pressure, high available phosphorus content and low growing frequency in previous years. Key factors for high mineral nitrogen content in spring included low cereal coverage in spring, vegetation cover in winters of previous years, high soil organic matter and silt content, low C/N ratio, low plough depth, legume intercropping before sowing and use of manure or slurry nitrogen. An additional survey year with additional parameters will be conducted before the project's official end.

    Faktoren des Körnererbsenertrages in der Ökolandbaupraxis

    With the aim of identifying key factors for field pea yield, 52 fields on 21 organic farms were evaluated from 2009 to 2011. Criteria for evaluation were soil variables, crop data, management and field history. The average pea yield of all farms was 2.1 t/ha, including 5 fields that were not harvested due to inadequate growth and/or high weed pressure. Based on measurements at three control points per field, yield was 65 % higher when harvesting by hand. The main causes were heterogeneous crops and high yield losses through combine harvesting. According to multiple linear regression analysis of yields at control points, factors leading to high pea yield included deep soil, high available phosphorus content, low clay content and longer intervals between pea crops. Reduction of weed pressure and deep sowing (up to 6 cm) were the main crop management variables affecting yield. An additional survey year with additional parameters will be conducted before the project's official end.

    A Theory of Visibility Measures in the Dissociation Paradigm

    Research on perception without awareness primarily relies on the dissociation paradigm, which compares a measure of awareness of a critical stimulus (direct measure) with a measure indicating that the stimulus has been processed at all (indirect measure). We argue that dissociations between direct and indirect measures can only be demonstrated with respect to the critical stimulus feature that generates the indirect effect, and the observer's awareness of that feature, the critical cue. We expand Kahneman's (1968) concept of criterion content to comprise the set of all cues that an observer actually uses to perform the direct task. Different direct measures can then be compared by studying the overlap of their criterion contents and their containment of the critical cue. Because objective and subjective measures may integrate different sets of cues, one measure generally cannot replace the other without sacrificing important information. Using a simple mathematical formalization, we redefine and clarify the concepts of validity, exclusiveness, and exhaustiveness in the dissociation paradigm, show how dissociations among different awareness measures falsify simple theories of consciousness, and formulate the demand that theories of visual awareness should be sufficiently specific to explain dissociations among different facets of awareness. Comment: v1: initial upload. v2: added arXiv identifier. v3: corrected an error in mathematical notation in the "definition (iii)" section. v5: adds reference to the published article. Note that the manuscript responding to this preprint has now been published in Psychonomic Bulletin & Review and should be cited preferentially.

    Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means

    We study k-clustering problems with lower bounds, including k-median and k-means clustering with lower bounds. In addition to the point set P and the number of centers k, a k-clustering problem with (uniform) lower bounds gets a number B. The solution space is restricted to clusterings where every cluster has at least B points. We demonstrate how to approximate k-median with lower bounds via a reduction to facility location with lower bounds, for which O(1)-approximation algorithms are known. Then we propose a new constrained clustering problem with lower bounds where we allow points to be assigned multiple times (to different centers). This means that for every point, the clustering specifies a set of centers to which it is assigned. We call this clustering with weak lower bounds. We give an 8-approximation for k-median clustering with weak lower bounds and an O(1)-approximation for k-means with weak lower bounds. We conclude by showing that at a constant increase in the approximation factor, we can restrict the number of assignments of every point to 2 (or, if we allow fractional assignments, to 1+ε). This also leads to the first bicriteria approximation algorithm for k-means with (standard) lower bounds, where bicriteria is interpreted in the sense that the lower bounds are violated by a constant factor. All algorithms in this paper run in time that is polynomial in n and k (and d for the Euclidean variants considered).
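
    The notion of weak lower bounds in this abstract can be illustrated with a simple feasibility check: every point is assigned to a set of centers, and every center that is used must receive at least B points. The sketch below only tests feasibility and does not compute or approximate a clustering; the function name, the dictionary representation and the optional `max_assignments` cap are assumptions made for the example.

```python
def satisfies_weak_lower_bounds(assignment, B, max_assignments=None):
    """Feasibility check for a clustering with weak lower bounds.

    assignment: dict mapping each point to the set of center ids it is
    assigned to (a hypothetical representation). B: uniform lower bound on
    the number of points per center. max_assignments: optional cap on how
    many centers a single point may be assigned to, e.g. 2.
    """
    load = {}
    for point, centers in assignment.items():
        if not centers:
            return False  # every point needs at least one center
        if max_assignments is not None and len(centers) > max_assignments:
            return False
        for c in centers:
            load[c] = load.get(c, 0) + 1
    # Every center that is used must serve at least B points.
    return all(n >= B for n in load.values())
```

    For instance, with three points where the middle one is shared between two centers, both centers reach B = 2, although no standard (single-assignment) clustering of three points into two clusters could do so.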

    Data Mining und wissenschaftliche Forschung – de lege lata und de lege ferenda

    The preliminary draft of the new Copyright Act contains, in Art. 24d E-URG, a provision governing the use of works for scientific purposes. Under this provision, the reproduction and adaptation of copyrighted works would in future be permitted insofar as this is required by the application of a technical process. The new provision targets cases of so-called text and data mining, i.e. the computer-assisted search, analysis and linking of data with the aim of gaining new insights and uncovering relationships. The question arises whether the uses of works named in Art. 24d E-URG are not already permitted under current law. This article first describes the process of text and data mining (I.) and its relevance under copyright law (II.), before examining the copyright limitations that may apply (III.). Finally, it discusses the new provision, Art. 24d E-URG (IV.).